Task 1 - 10X Chromium Analysis

knitr::include_graphics("task1 images/overview.png")
knitr::include_graphics("task1 images/low coverage.png")
Figure 1. Read Coverage of 10X Chromium Plasmodium falciparum mapped reads viewed in Artemis. A) This is an overview of the mapped reads across chromosome 1 B) This is an example of low read coverage of the SURF1 gene.Figure 1. Read Coverage of 10X Chromium Plasmodium falciparum mapped reads viewed in Artemis. A) This is an overview of the mapped reads across chromosome 1 B) This is an example of low read coverage of the SURF1 gene.

Figure 1. Read Coverage of 10X Chromium Plasmodium falciparum mapped reads viewed in Artemis. A) This is an overview of the mapped reads across chromosome 1 B) This is an example of low read coverage of the SURF1 gene.

In Figure 1, we can see the alignment of 10X chromium Plasmodium falciparum reads mapped against the reference genome. These coverage plots were generated using the Artemis genome browser / annotation tool (Carver et al. 2012). In panel A), we can see the overall read coverage for chromosome 1, where most of the reads cover the span of the chromosome and only few regions of low coverage. This low coverage can be seen in panel B), where the SURF1 gene has very low read coverage. This could be due to low RNA content in the droplets , not enough cells loaded into the Chromium chip, or a lowered Fraction Reads per cell. Overall, this was a good 10X chromium run.

Task 2 - Cell Ranger QC

knitr::include_graphics("task1 images/cellranger.png")
Figure 2. Cell Ranger QC Overview

Figure 2. Cell Ranger QC Overview

In Figure 2., we can see the read QC statistics from a 10X chromium run in the Cell Ranger software. This was not a good single cell run for several reasons. Firstly, there were more than 10k cells that were run and could have led to more cell doublets per droplets. The sequencing saturation was very low (38.4%), this was probably caused by either low sequencing depth, errors with library preparation due to poor sample quality, or the presence of low abundance transcripts in the samples. This could explain why there were 5535 gene detected in total but only a median of 687 genes detected per cell.

This could be improved with better library preparation, and we should keep in mind for the analysis that a majority of the cells will have portions of the genes that will be expressed at a lower rate or not at all. This would give the potential impression that there will be greater differential expression than there should be.

Task 3 - Integration analysis with Seurat

Data Loading

# Loading libraries
library(tidyverse)
library(Seurat) # using fork:  https://github.com/scottgigante/seurat/tree/patch/add-PHATE-again
library(SeuratData)
library(harmony)
library(slingshot)
library(viridis)
library(patchwork)
library(phateR)
library(RColorBrewer)


# Reading Raw 10X data, source : McIntyre et al., 2020
KO1 <- Read10X(data.dir = 'KO1')
WT1 <- Read10X(data.dir = 'WT1')
WT2 <- Read10X(data.dir = 'WT2')
WT3 <- Read10X(data.dir = 'WT3')

# create a list of datasets
object.list <- list(KO1 = KO1, WT1 = WT1, WT2 = WT2, WT3 = WT3)

# CreateSeuratObject loop, this allows us to use the 10X chromium single cell data to be analysed in the Seurat package. Once created, they can be integrated together.
for (sample in names(object.list)) {
  # Creating a Seurat object of each 10X dataset
  object.list[[sample]] <- CreateSeuratObject(counts = object.list[[sample]],
                                           project = sample,
                                           min.cells = 3, 
                                           min.features = 0)
  
  # Adding Group samples to metadata
  object.list[[sample]]$Group <- substr(sample, 1, nchar(sample)-1)

}

QC

# Here we will assess the quality of all the datasets before integration, we want to remove doublet cells by removing droplets that contain excessively large feauture (gene) counts/ rna counts outliers. Cells that are expiring or expired, when being barcoded in the micro-fluidic device, their mitochondrial DNA makes up a higher percentage of the sequenced DNA. 

# Unfiltered

QC.plots <- lapply(X = object.list, FUN = function(x) {
  x[['percent.mito']] <- PercentageFeatureSet(x, pattern = "^mt-")
  
  VlnPlot(x, features = c("nFeature_RNA", "nCount_RNA",
                          "percent.mito"), ncol = 3,) + 
             labs(subtitle = 'unfiltered results') + 
             theme(plot.subtitle = element_text(hjust = 0.5))
})

# Filtered
QC.plots2 <- lapply(X = object.list, FUN = function(x) {
  x$percent.mito <- PercentageFeatureSet(x, pattern = "^mt-")
  x <- subset(x, subset = nFeature_RNA > 500 & nFeature_RNA < 2700 &
                          percent.mito > 2 & percent.mito < 10 &
                          nCount_RNA < 7500)
  VlnPlot(x, features = c("nFeature_RNA", "nCount_RNA",
                          "percent.mito"), ncol = 3,) + 
             labs(subtitle = 'filtered results') + 
             theme(plot.subtitle = element_text(hjust = 0.5))

})

QC.plots
## $KO1
## 
## $WT1
## 
## $WT2
## 
## $WT3
QC.plots2
## $KO1
## 
## $WT1
## 
## $WT2
## 
## $WT3
Figure 3. QC plots of feature / RNA count and Percentage Mitochondrial DNA for each cell per replicate. We can see the raw data and the cells that were filtered by max features of 2700 and max mitochondrial DNA of 10%.Figure 3. QC plots of feature / RNA count and Percentage Mitochondrial DNA for each cell per replicate. We can see the raw data and the cells that were filtered by max features of 2700 and max mitochondrial DNA of 10%.Figure 3. QC plots of feature / RNA count and Percentage Mitochondrial DNA for each cell per replicate. We can see the raw data and the cells that were filtered by max features of 2700 and max mitochondrial DNA of 10%.Figure 3. QC plots of feature / RNA count and Percentage Mitochondrial DNA for each cell per replicate. We can see the raw data and the cells that were filtered by max features of 2700 and max mitochondrial DNA of 10%.Figure 3. QC plots of feature / RNA count and Percentage Mitochondrial DNA for each cell per replicate. We can see the raw data and the cells that were filtered by max features of 2700 and max mitochondrial DNA of 10%.Figure 3. QC plots of feature / RNA count and Percentage Mitochondrial DNA for each cell per replicate. We can see the raw data and the cells that were filtered by max features of 2700 and max mitochondrial DNA of 10%.Figure 3. QC plots of feature / RNA count and Percentage Mitochondrial DNA for each cell per replicate. We can see the raw data and the cells that were filtered by max features of 2700 and max mitochondrial DNA of 10%.Figure 3. QC plots of feature / RNA count and Percentage Mitochondrial DNA for each cell per replicate. We can see the raw data and the cells that were filtered by max features of 2700 and max mitochondrial DNA of 10%.

Figure 3. QC plots of feature / RNA count and Percentage Mitochondrial DNA for each cell per replicate. We can see the raw data and the cells that were filtered by max features of 2700 and max mitochondrial DNA of 10%.

Integration with Seurat

#### Standard seurat objects preparation before integration ####

# Iterating through objects to filter by QC threshold, normalize data and identify variable features. 

for (x in seq_along(object.list)) {
  object.list[[x]][['percent.mito']] <- PercentageFeatureSet(object.list[[x]],
                                                             pattern = "^mt-")

  x <- subset(object.list[[x]], subset = nFeature_RNA < 2700 & percent.mito < 10 )
  
# I tried to run SCTransform to normalise the data, it ran fine but clashed with the MAST test later on in the Differential Expression of cluster 0. # We would have used SCTransform to normalise the UMI data, this uses a negative binomial GLM using this equation, UMI counts ~ sequencing depth. This is a more tailored normalisation method for single cell data by Christoph Hafemeister & Rahul Satija (2019).
  
  # x <- SCTransform(x, vars.to.regress = "percent.mito",
                   #verbose = F)
  
  x <- NormalizeData(x,verbose=F)
  
  x <- FindVariableFeatures(x, selection.method = "vst",
                            nfeatures = 2000,
                            verbose=F)
}

# selecting features that are repeatedly variable across datasets for anchors
features <- SelectIntegrationFeatures(object.list = object.list,
                                      verbose=F)

#### Integration #####
# anchors to tie datasets together
int.anchors <- FindIntegrationAnchors(object.list = object.list,
                                      anchor.features = features,
                                      verbose=F)

# this command creates an 'integrated' seurat object
gammadelta <- IntegrateData(anchorset = int.anchors,
                            verbose=F)

# Removing anchors after integration to save memory
rm(int.anchors)

Standard Seurat pipeline

# Run the standard workflow for visualization and clustering
gammadelta <- gammadelta %>% 
  ScaleData(verbose = FALSE) %>%
  
# Here we run the two standard reduction of PCA and UMAP, I ran the PCA up to 50 components and ran the elbow plot to determine the adequate number of components. Here I settled on 20 components, as anything above that doesn't account for more variance. 
  RunPCA(npcs = 20, verbose = FALSE) %>% 

# I then ran the UMAP reduction using the PCA embeddings. We use UMAP instead of PCA for visualisations because it is a non-linear reduction. This helps digest the high-dimensional data from multiple PCA and the larger expression matrix generated from the 10X chromium method. We use this instead of t-SNE because of the better performance and that it converses the underlying structure of the data. These reduction will then being used to run the K-nearest neighbours alogrithm and then determine the clusters using the shared nearest neighbours clustering algorithm. 

  RunUMAP(reduction = 'pca', dims = 1:20, verbose=F) %>% 
  
  FindNeighbors(reduction = "umap", dims = 1:2, verbose=F) %>% 

# Due to the samples being only gamma-delta T cells, I used a low resolution parameter since the cells are not too different from one another. When I tried a resolution of 0.5, it created alot of small clusters that shared similar expression profiles. I lowered the resolution until I settled on a resolution of 0.1 that generated a resonable amount clusters. 
  
  FindClusters(resolution = 0.1, verbose=F)

Here we run the two standard reduction of PCA and UMAP, I ran the PCA up to 50 components and ran the elbow plot to determine the adequate number of components. Here I settled on 20 components, as anything above that doesn’t account for more variance.

I then ran the UMAP reduction using the PCA embeddings. We use UMAP instead of PCA for visualisations because it is a non-linear reduction. This helps digest the high-dimensional data from multiple PCAs and the larger expression matrix generated from the 10X chromium method. We use this instead of t-SNE because of the better performance and that it conserves the underlying structure of the data. These reduction will then be used to run the K-nearest neighbours alogrithm and then determine the clusters using the shared nearest neighbours clustering algorithm.

Due to the samples being only gamma-delta T cells, I used a low resolution parameter since the cells are not too different from one another. When I tried a resolution of 0.5, it created a lot of small clusters that shared similar expression profiles. I lowered the resolution until I settled on a resolution of 0.1 that generated a reasonable amount clusters.

UMAP plot

# UMAP Visualization

# saveRDS(gammadelta, file='gammadelta.rds')
# gammadelta <- readRDS('gammadelta.rds')

gammadelta.subset <- subset(gammadelta, seurat_clusters %in% c(0:8,12))

# Below I tried to re-cluster after subsetting the umap dimension data, I thought it find more defined sub-clusters but it did not seem to do much other than create tiny clusters

# ----------------------------------------------------------#
# # rerun clustering after subsetting to verify output
# gammadelta.subset <- gammadelta.subset %>% 
#   ScaleData(verbose = FALSE) %>% 
#   RunPCA(npcs = 15, verbose = FALSE) %>% 
# 
# # Explain reductions
#   RunUMAP(reduction = 'pca', dims = 1:15) %>% 
#   FindNeighbors(reduction = "umap", dims = 1:2) %>% 
# 
# # Explain resolution choice
#   FindClusters(resolution = 0.1)
# ----------------------------------------------------------#


umap <- DimPlot(gammadelta, reduction = "umap",
                label = T, repel = T) + 
  theme(axis.ticks = element_blank(), axis.text = element_blank())

# Here I selected clusters of an adequate size (< 100 cells), including a small cluster due to it looking like naive T cell (62L+ 44+ 122-)


# Renaming clusters
new.cluster.ids <- c("CD44+ IL17+",
                     "CD44- 62L+ 122+ IFN-G+",
                     "CD44- 62L+ 122+ IFN-G+",
                     "CD44+ IL17+",
                     "CD44+ IL17+ TcrV6+",
                     "CD44+ IL17+",
                     "Gzma+ Cytolytic T cell",
                     "CD44- 62L+ 122+ IFN-G+",
                     'Gzma+ Cytolytic T cell',
                     "CD44+ CD62L+ 122-")


names(new.cluster.ids) <- levels(gammadelta.subset)
gammadelta.subset <- RenameIdents(gammadelta.subset, new.cluster.ids)

# UMAP with reduced cluster noise
png("umap.png",width = 800,height = 800)
DimPlot(gammadelta.subset,
                       reduction = "umap", label = T, repel = T,
                       cols = brewer.pal(10, "Set3"),pt.size = 1,
                       split.by = 'Group') + 
  xlim(-10,17) + ylim(-10,17) + 
  labs(title = 'Seurat UMAP plot') + 
  theme(axis.ticks = element_blank(),
        axis.text = element_blank(),legend.position = 'none')
dev.off
## function (which = dev.cur()) 
## {
##     if (which == 1) 
##         stop("cannot shut down device 1 (the null device)")
##     .External(C_devoff, as.integer(which))
##     dev.cur()
## }
## <bytecode: 0x7fe966998ff8>
## <environment: namespace:grDevices>
knitr::include_graphics("umap.png")
Figure 4. Seurat UMAP plot. There are five annotated clusters: CD44- CD62L+ 122+ IFN-G+ CD44+ IL17+, CD44+ IL17+ TcrV6+, Gzma+ Cytolytic T cell, CD44+ 62L+ 122-. They have formed two distinct groups with the CD44+ 62L+ 122- cells being an outlier cluster. The UMAP is split by replicate Group to visualise the loss of the CD44+ group.

Figure 4. Seurat UMAP plot. There are five annotated clusters: CD44- CD62L+ 122+ IFN-G+ CD44+ IL17+, CD44+ IL17+ TcrV6+, Gzma+ Cytolytic T cell, CD44+ 62L+ 122-. They have formed two distinct groups with the CD44+ 62L+ 122- cells being an outlier cluster. The UMAP is split by replicate Group to visualise the loss of the CD44+ group.

Frequency plots

# Here I extract the cell counts and their cluster they belong to. I will use this to create plots of the fraction of cell that belong to each group per cluster.  
cells_per_cluster <- data.frame(table(Idents(gammadelta.subset),
                                            gammadelta.subset$Group)) %>%
  group_by(Var1) %>%
  mutate(freq_fraction = Freq/sum(Freq)) %>% 
  na.exclude()

frac_plot <- ggplot(cells_per_cluster, aes(y=Var1, x=freq_fraction, fill=Var2)) + 
  geom_col() +
  labs(y="Clusters", x="Fraction of cells",
       title="Fraction of cells per cluster",
       fill = 'Group') +
  scale_fill_manual(values=c("navyblue", "beige"))+
  theme_classic() +
  theme(legend.position = 'none')

# Here I made a plot that shows the amount of cells per cluster and the fraction of cells that belong to each group. I thought it might be a useful side plot to add more context. 
count_plot <- ggplot(cells_per_cluster, aes(y=Var1, x=Freq, fill=Var2)) +
  geom_col() +
    labs(y="", x="Cell Counts",
       title="Cell Counts per cluster",
       fill = 'Group') +
  scale_fill_manual(values=c("navyblue", "beige"))+
  theme_classic() +
  theme(axis.text.y = element_blank())

frac_plot + count_plot
Figure 5. Fraction of cells per cluster per group. In the first panel we have the clusters ranked by size of the cluster and the breakdown of the fraction of cells for each group. KO in navyblue and WT in the off-white. In the second panel we have cell counts per cluster per group.

Figure 5. Fraction of cells per cluster per group. In the first panel we have the clusters ranked by size of the cluster and the breakdown of the fraction of cells for each group. KO in navyblue and WT in the off-white. In the second panel we have cell counts per cluster per group.

In Figure 5, we can see the frequency (fraction) of cells that belong to each group per cluster. We can see that the clusters Gzma+ Cytolytic T cell and CD44- CD62L+ 122+ IFN-G+ have a near even fraction of cells between group. There is a slight bias towards the WT group since there are three WT replicates against 1 KO replicate. This should be considered in the rest of the analysis of this data.

It appears the Knock-out of β2 integrin (CD18) has increased the proliferation of CD44- CD62L+ 122+ IFN-G and Gzma+ Cytolytic T cell clusters or decreased the proliferation of CD44+ IL17+ and CD44+ IL17+ TcrV6+ clusters. The importance of CD18 T cell development and phenotypic divergence has been discussed in Verma and Kelleher, 2017. This could be preliminary evidence that CD18 has an effect on T cell development in these specialised γδ T cells.

The CD44+ CD62L+ 122- cluster shows a naive T cell expression profile, although they are mostly present in the KO group, so this a weak annotation although possibly interesting if more KO replicates are added to the analysis.

Integration using Harmony

# Here we are using an alternative integration method, Harmony (Korsunsky et al, 2019). The authors claim that it is a superior integration method compared to others, such as the Seurat method. We will run the integration with Harmony and run the differential expression of the largest cluster and compare the results to differential expression after the Seurat Integration.

# CreateSeuratObject loop, this allows us to use the 10X chromium single cell data to be analysed in the Seurat package. Once created, they can be integrated together.

object.list <- list(KO1 = KO1, WT1 = WT1, WT2 = WT2, WT3 = WT3)

for (sample in names(object.list)) {
  # Creating a Seurat object of each 10X dataset
  object.list[[sample]] <- CreateSeuratObject(counts = object.list[[sample]],
                                           project = sample,
                                           min.cells = 3, 
                                           min.features = 0)
  
  # Adding Group samples to metadata
  object.list[[sample]]$Group <- substr(sample, 1, nchar(sample)-1)

}

# Extracting the seurat objects
KO1 <- object.list[[1]]
WT1 <- object.list[[2]]
WT2 <- object.list[[3]]
WT3 <- object.list[[4]]

# Merging all datasets to be integrated 
gammadelta.harmony <- merge(x= KO1, y= c(WT1, WT2, WT3),
                            add.cell.ids = c("KO", "WT", "WT","WT"),
                            project = 'gammadelta harmony')

# Standard Seurat pipeline
gammadelta.harmony <- NormalizeData(gammadelta.harmony,
                                    verbose = F)
gammadelta.harmony <-  FindVariableFeatures(gammadelta.harmony,
                                            selection.method = "vst",
                                            nfeatures = 2000,
                                            verbose = F) 
gammadelta.harmony <-  ScaleData(gammadelta.harmony,
                                 verbose = F) 
gammadelta.harmony <-  RunPCA(gammadelta.harmony,
                              npcs = 30,
                              verbose = F) 
gammadelta.harmony <-  RunUMAP(gammadelta.harmony,
                               reduction = "pca",
                               dims = 1:30,
                               verbose = F) 

# Running Harmony integration
gammadelta.harmony <-  gammadelta.harmony %>% RunHarmony("orig.ident",
                                  plot_convergence = T)
# Run the standard workflow for visualization and clustering
gammadelta.harmony <- gammadelta.harmony %>% 
  ScaleData(verbose = FALSE) %>% 
  
# Using the same reductions and parameters for a better comparaison
  RunPCA(npcs = 20, verbose = FALSE) %>% 
  RunUMAP(reduction = 'pca', dims = 1:20) %>% 
  FindNeighbors(reduction = "umap", dims = 1:2) %>% 
  FindClusters(resolution = 0.1, verbose = F)

Differential Expression of Cluster CD44+ IL17+

# Performing DE between KO and WT for cells belong to cluster CD44+ IL17+, as it the largest cluster that contains both groups, performing DE using default Wilcox Rank Sum and MAST for comparaison of best method. Filtering for genes that have a significant p.adjusted value to account for multiple correction

# Preparing SCT data for DE
# gammadelta.subset <- PrepSCTFindMarkers(gammadelta.subset)

#### (d) Seurat ####

# subsetting for largest cluster
cluster0 <- subset(gammadelta, seurat_clusters %in% c(0))

# DE by wilcox and MAST by group
DE.wilcox <- FindMarkers(cluster0, ident.1 = 'KO', ident.2 = 'WT',
                         group.by = 'Group', assay = 'RNA',verbose = F) %>% 
            filter(p_val_adj !=0 & p_val_adj < 0.01 & avg_log2FC > 0.5)

DE.mast <- FindMarkers(cluster0,
                       ident.1 = 'KO', ident.2 = 'WT', group.by = 'Group',
                       assay = 'RNA', test.use = 'MAST',verbose = F) %>% 
            filter(p_val_adj !=0 & p_val_adj < 0.01 & avg_log2FC > 0.5)

#### (e) HARMONY ####

# subsetting for largest cluster
cluster0 <- subset(gammadelta.harmony, seurat_clusters %in% c(0))

# DE by wilcox and MAST by group
DE.harmony.wilcox <- FindMarkers(cluster0, ident.1 = 'KO', ident.2 = 'WT',
                         group.by = 'Group', assay = 'RNA',verbose = F) %>% 
            filter(p_val_adj !=0 & p_val_adj < 0.01 & avg_log2FC > 0.5)

DE.harmony.mast <- FindMarkers(cluster0,
                       ident.1 = 'KO', ident.2 = 'WT', group.by = 'Group',
                       assay = 'RNA', test.use = 'MAST',verbose = F) %>% 
            filter(p_val_adj !=0 & p_val_adj < 0.01 & avg_log2FC > 0.5)


#### Venn Diagram ####
library(VennDiagram)

venn.diagram(x = list(rownames(DE.wilcox), rownames(DE.mast),
                   rownames(DE.harmony.wilcox), rownames(DE.harmony.mast)),
             category.names = c("DE Wilcox" , "DE Mast " ,
                                "DE Wilcox (Harmony)","DE Mast (Harmony)"),
             filename = 'sVenn.png',
             lwd = 2,
             fill=brewer.pal(4, "Paired"),
             imagetype = 'png')
## [1] 1
knitr::include_graphics("sVenn.png")
Figure 6. Venn diagram of DE upregulated gene counts from Wilcox Rank Sum and MAST test on the largest cluster CD44+ IL17+, from Seurat and Harmony integrated datasets, to compare the DE test. Upregulated significant genes determined using thresholds of logfold > 0.5, p.adjust < 0.01.

Figure 6. Venn diagram of DE upregulated gene counts from Wilcox Rank Sum and MAST test on the largest cluster CD44+ IL17+, from Seurat and Harmony integrated datasets, to compare the DE test. Upregulated significant genes determined using thresholds of logfold > 0.5, p.adjust < 0.01.

In the Venn Diagram, we can see that the number of upregulated gene (logfold > 0.5, p.adjust < 0.01) per differential expression (DE) test. We ran both a Wilcox Rank Sum and MAST on to compare, and clusters from two integration methods. The first being the standard Seurat method and the second being the Harmony method developed by Korsunsky et al., (2019).

The DE was done on the largest cluster, CD44+ IL17+, comparing the KO against the WT. The DE on the Seurat integrated cluster found 186 upregulated genes for both the Wilcox and MAST test. Although, one of the genes were unique to each test, they shared 184 genes in common, 18 of which were shared with the Harmony integration DE tests.

The Harmony integration DE tests only found 31 up regulated genes, 13 of which were unique to the Harmony integration cluster. It would seem that the integration method had a greater effect on the determination of up-regulated genes, however I suspect that this could be a clustering error on my part.

In terms of the two DE tests, it would seem that both methods are comparable in results, with the only distinguishing feature being the difference in p adjusted values.

Task 4 - Trajectory analysis

# PHATE projection

gammadelta.subset.wt <- subset(gammadelta, seurat_clusters %in% c(0:8)) %>% 
  subset(Group %in% c('WT'))

# Adding cluster names to WT subset seurat object
names(new.cluster.ids) <- levels(gammadelta.subset.wt)
gammadelta.subset.wt <- RenameIdents(gammadelta.subset.wt, new.cluster.ids)

# Running PHATE on WT only Seurat 
gammadelta.subset.wt <- RunPHATE(gammadelta.subset.wt)

## Alternate phate method ##
# phate <- phate(t(GetAssayData(gammadelta.subset.wt)))
# 
# 
# gammadelta.subset.wt@reductions$phate <- CreateDimReducObject(embeddings = phate[["embedding"]],
#                                                         key = 'phate',
#                                                         assay = 'RNA')


gammadelta.sce <- as.SingleCellExperiment(gammadelta.subset.wt, assay = 'RNA')

# getting clusters for slingshot trajectory
colData(gammadelta.sce)$Seurat_clusters <- as.character(gammadelta.subset.wt@active.ident)

# Slingshot using umap reduction
# gammadelta.sce <- slingshot(gammadelta.sce,
#                   clusterLabels = 'Seurat_clusters', reducedDim = 'UMAP')

# Slingshot using phate reduction
gammadelta.sce <- slingshot(gammadelta.sce,
                            clusterLabels = 'seurat_clusters',
                            reducedDim = 'PHATE')


# Extracting trajectory curve data 
phate.sling <- SlingshotDataSet(gammadelta.sce)

# VISUALISATIONS

# unique cluster names from active ident
cluster_names = as.character(unique(gammadelta.subset.wt@active.ident))

# Setting colour for clusters
colour = brewer.pal(12, "Set3")[gammadelta.sce$ident]


# PHATE reduction plot
# png("phate.png",width = 800,height = 800)
plot(reducedDims(gammadelta.sce)$PHATE, col = colour, pch= 16, cex=1)
lines(phate.sling@curves$Lineage1, col='black') 

legend('topright',cluster_names,
       fill = c("#FFFFB3","#8DD3C7","#BEBADA", "#FB8072"),
       inset = c(0, 0)) 

# # Custer names
# text(c(-0.015,0.025,0.02,-0.015,1),
#      c(-0.015,-0.02,0.02,0.03,1),
#      labels = cluster_names, cex = 0.8, adj = 0.5)

title(main='Slingshot PHATE Trajectory Plot') 

dev.off()
## quartz_off_screen 
##                 3
knitr::include_graphics("phate.png")
Figure 7. Slingshot Trajectory Plot using PHATE reduction method. PHATE embeddings were determined using the RunPHATE function and the PCA dimensions generated in the standard seurat pipeline. A slingshot trajectory between the clusters using the slingshot function using the PHATE reduction.

Figure 7. Slingshot Trajectory Plot using PHATE reduction method. PHATE embeddings were determined using the RunPHATE function and the PCA dimensions generated in the standard seurat pipeline. A slingshot trajectory between the clusters using the slingshot function using the PHATE reduction.

# UMAP reduction plot

# 
# # UMAP plot parameters
# plot(reducedDim(gammadelta.sce,
#                 type = 'UMAP'),
#      pch=20, cex=0.5,
#      col= brewer.pal(12, "Set3")[gammadelta.subset.wt@active.ident])
# #lines(SlingshotDataSet(gammadelta.sce,))
# text(c(-5,-5,2,5,8),
#      c(-8,3,8,-3,12),
#      labels = cluster_names, cex = 0.8, pos = 3)
# title(main='Slingshot UMAP Trajectory Plot')

In the Trajectory plot, our clusters have been projected onto PHATE space using the phateR package developed by Moon et al., 2019. Much like PCA and UMAP, where they digest high-dimensional single cell data to lower dimension embeddings for better visualisation. Where PHATE differs from the other two, is that it was developed specifically for single cell data to capture the nonlinear structure of the data points and handle the noise effectively.

For our trajectory, we can speculate that it runs from the CD44+ IL-17+ cluster and ends in the CD44- 62L+ 122+ IFN-G+, passing through TcrV6+ and Gzma+ Cytolytic T cells. The speculation stems from the fact that the Gzma+ Cytolytic T cell and CD44- 62L+ 122+ IFN-G+ clusters are both greatly reduced in the KO group, as seen in the UMAP Figure 4. Perhaps CD18 KO results in the arrested development of Gzma+ Cytolytic T cell and CD44- 62L+ 122+ IFN-G+ clusters, and that it is a key component the differentiation of these γδ T cells.

We could have seen a greater curve to our PHATE projected cells perhaps if we had created 7 to 8 clusters instead of the 4 large clusters we settled on. However, we settled on these clusters because it was difficult to find clear gene markers for each clusters that were related to the biology of T cells.

Task 5 - Discussion

Overall, the γδ T cell experiment had a solid design in terms of testing groups, although the uneven number of replicates in the groups could be improved in the future. By adding more KO biological replicates, we would be able to define clearer clusters and find more DE genes. Ensuring that the replicates have an equal amount of cells among them since WT1 had far fewer cells compared to rest and could have lead to a bias in the results.

Perhaps including other related T cell subtypes in the 10X chromium experiment could mean that we find the effect of CD18 KO on those subtypes and perhaps see if similar genes are up-regulated as well.

References

Butler, A., Hoffman, P., Smibert, P., Papalexi, E. and Satija, R., 2018. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature biotechnology, 36(5), pp.411-420.

Carver, T., Harris, S.R., Berriman, M., Parkhill, J. and McQuillan, J.A., 2012. Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics, 28(4), pp.464-469.

Hafemeister, C. and Satija, R., 2019. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome biology, 20(1), p.296.

Korsunsky, I., Millard, N., Fan, J., Slowikowski, K., Zhang, F., Wei, K., Baglaenko, Y., Brenner, M., Loh, P.R. and Raychaudhuri, S., 2019. Fast, sensitive and accurate integration of single-cell data with Harmony. Nature methods, 16(12), pp.1289-1296.

McIntyre, C.L., Monin, L., Rop, J.C., Otto, T.D., Goodyear, C.S., Hayday, A.C. and Morrison, V.L., 2020. β2 Integrins differentially regulate γδ T cell subset thymic development and peripheral maintenance. Proceedings of the National Academy of Sciences, 117(36), pp.22367-22377.

Moon, K.R., van Dijk, D., Wang, Z., Gigante, S., Burkhardt, D.B., Chen, W.S., Yim, K., Elzen, A.V.D., Hirn, M.J., Coifman, R.R. and Ivanova, N.B., 2019. Visualizing structure and transitions in high-dimensional biological data. Nature biotechnology, 37(12), pp.1482-1492.